Research Synthesis Methods

17 training papers, 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Large language models for abstract screening in systematic- and scoping reviews: A diagnostic test accuracy study
2024-10-02 radiology and imaging 10.1101/2024.10.01.24314702
#1 (25.9%)

Introduction: We investigated if large language models (LLMs) can be used for abstract screening in systematic and scoping reviews. Methods: Two broad reviews were designed: a systematic review structured according to the PRISMA guideline, with abstract inclusion based on PICO criteria; and a scoping review, where we defined abstract characteristics and features of interest to look for. For both reviews, 500 abstracts were sampled. Two readers independently screened abstracts with disagreements hand...

2
The epidemiology of systematic review updates: a longitudinal study of updating of Cochrane reviews, 2003 to 2018.
2019-12-11 epidemiology 10.1101/19014134
#1 (23.0%)

Background: The Cochrane Collaboration has been publishing systematic reviews in the Cochrane Database of Systematic Reviews (CDSR) since 1995, with the intention that these be updated periodically. Objectives: To chart the long-term updating history of a cohort of Cochrane reviews and the impact on the number of included studies. Methods: The status of a cohort of Cochrane reviews updated in 2003 was assessed at three time points: 2003, 2011, and 2018. We assessed their subject scope, compiled thei...

3
Evaluating Loon Lens Pro, an AI-Driven Tool for Full-Text Screening in Systematic Reviews: A Validation Study
2025-02-14 epidemiology 10.1101/2025.02.11.25322087
#1 (22.0%)

Background: Systematic literature reviews (SLRs) are essential for evidence synthesis but are hampered by the resource-intensive full-text screening phase. Loon Lens Pro, a publicly available agentic AI tool, automates full-text screening without prior training by using user-defined inclusion/exclusion criteria and multiple specialized AI agents. This study validated Loon Lens Pro against human reviewers to assess its accuracy, efficiency, and confidence scoring in screening. Methods: In this compa...

4
Assumptions for creating matrices of evidence to estimate overlap of primary studies in overviews of reviews: Protocol for a meta-research study
2023-06-16 epidemiology 10.1101/2023.06.16.23291488
#1 (21.1%)

Introduction: Overlap of primary studies among systematic reviews (SRs) included in an overview is a major challenge, as it may bias results or artificially increase the precision of the synthesis. Matrices of evidence and corrected covered area (CCA) calculation are recommended methods to manage overlap, but there is little guidance on how to construct these matrices. This research aims to explore variations in the estimation of overlap using CCA matrices under different assumptions. Methods: We w...
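
The corrected covered area mentioned in this abstract has a standard closed form: with N inclusion ticks in the matrix of evidence, r primary studies (rows) and c reviews (columns), CCA = (N − r) / (rc − r). A minimal sketch of that calculation, with illustrative function and variable names of my own choosing:

```python
def corrected_covered_area(matrix):
    """Corrected covered area (CCA) for a matrix of evidence.

    matrix: one row per primary study, one column per systematic review;
    an entry of 1 means that review includes that study, 0 otherwise.
    CCA = (N - r) / (r*c - r), where N is the total number of inclusion
    ticks, r the number of studies, and c the number of reviews.
    """
    r = len(matrix)                       # primary studies
    c = len(matrix[0])                    # systematic reviews
    n = sum(sum(row) for row in matrix)   # inclusion ticks
    return (n - r) / (r * c - r)

# Three primary studies across two reviews; one study appears in both.
overlap = corrected_covered_area([[1, 1],
                                  [1, 0],
                                  [0, 1]])
print(round(overlap, 3))  # 0.333
```

The denominator excludes each study's first occurrence, so CCA is 0 when no study is shared between reviews and 1 when every review includes every study.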

5
Amount and certainty of evidence in Cochrane systematic reviews of interventions: a large-scale meta-research study
2025-12-21 public and global health 10.64898/2025.12.19.25342674
#1 (19.4%)

Objectives: To quantify the amount and certainty of evidence in Cochrane systematic reviews of interventions, and to describe how this evidence has evolved over time. Design: Large-scale meta-research study. Data source: Cochrane Database of Systematic Reviews (search date April 8, 2025). Eligibility criteria: Cochrane systematic reviews assessing interventions reporting "Summary of findings" tables. Data extraction: Data were automatically extracted using web scraping and a large language model, with q...

6
Agreeability testing of AMSTAR-PF, a tool for quality appraisal of systematic reviews of prognostic factor studies
2025-04-14 epidemiology 10.1101/2025.04.10.25325555
#1 (19.3%)

Background: This paper details initial testing of the agreeability and usability of a novel quality appraisal tool for systematic reviews of prognostic factor studies: AMSTAR-PF. Methods: Fourteen appraisers each assessed eight systematic reviews using AMSTAR-PF. Their ratings for each question and each article were compared, with interrater, inter-pair and intra-pair agreeability calculated using Gwet's agreement coefficient. Time of use and time to reach consensus were also recorded. Results: Interr...
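
Gwet's agreement coefficient, used in this abstract, is a chance-corrected agreement statistic that is more stable than Cohen's kappa when ratings are skewed toward one category. A sketch of the two-rater, binary-rating case (AC1); the rater data here are invented for illustration:

```python
def gwet_ac1(ratings_a, ratings_b):
    """Gwet's AC1 agreement coefficient for two raters, binary ratings.

    AC1 = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and chance agreement p_e = 2*pi*(1 - pi), with pi the mean
    proportion of positive (1) ratings across both raters.
    """
    n = len(ratings_a)
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    pi = (sum(ratings_a) + sum(ratings_b)) / (2 * n)
    p_e = 2 * pi * (1 - pi)
    return (p_obs - p_e) / (1 - p_e)

# Two appraisers rate eight items (1 = criterion met), agreeing on six.
a = [1, 1, 1, 1, 0, 0, 1, 0]
b = [1, 1, 1, 0, 0, 1, 1, 0]
print(round(gwet_ac1(a, b), 3))  # 0.529
```

Extending to more raters or more categories changes the formulas for p_o and p_e, but the chance-correction structure is the same.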

7
Accuracy and efficiency of using artificial intelligence for data extraction in systematic reviews. A noninferiority study within reviews
2026-02-27 public and global health 10.64898/2026.02.25.26347053
#1 (19.3%)

Background: Systematic reviews are important for informing public health policies and program selection; however, they are time- and resource-intensive. Artificial intelligence (AI) offers a solution to reduce these labour-intensive requirements for various aspects of systematic review production, including data extraction. To date, there is limited robust evidence evaluating the accuracy and efficiency of AI for data extraction. This study within a review (SWAR) aimed to determine whether human d...

8
ChatGPT for assessing risk of bias of randomized trials using the RoB 2.0 tool: A methods study
2023-11-22 epidemiology 10.1101/2023.11.19.23298727
#1 (19.0%)

Background: Internationally accepted standards for systematic reviews necessitate assessment of the risk of bias of primary studies. Assessing risk of bias, however, can be time- and resource-intensive. AI-based solutions may increase efficiency and reduce burden. Objective: To evaluate the reliability of ChatGPT for performing risk of bias assessments of randomized trials using the revised risk of bias tool for randomized trials (RoB 2.0). Methods: We sampled recently published Cochrane systematic ...

9
Automation of Systematic Reviews with Large Language Models
2025-06-13 health informatics 10.1101/2025.06.13.25329541
#1 (19.0%)

Importance: Systematic reviews (SRs) inform evidence-based decision making. Yet, many take over a year to complete, are labor intensive, prone to human error, and face reproducibility challenges; thus limiting access to timely and reliable information. Objective: To validate a large language model (LLM)-based workflow (otto-SR) to automate three of the most labour-intensive tasks in performing SRs: article screening, data extraction, and risk of bias assessment; a...

10
Stage-wise algorithmic bias, its reporting, and relation to classical systematic review biases in AI-based automated screening in health sciences: A structured literature review
2025-05-16 health informatics 10.1101/2025.05.16.25327774
#1 (18.9%)

Introduction: Algorithmic bias in systematic reviews that use automatic screening is a major challenge in the application of AI in health sciences. This article presents preliminary findings from the project titled "Identification, Reporting, and Mitigation of Algorithmic Bias in Systematic Reviews with AI-Assisted Screening: Systematic Review and Development of a Checklist for its Evaluation", registered in PROSPERO with the registration number CRD420251036600 (https://www.crd.york.ac.uk/PROSPERO/...

11
Enough evidence and other endings: a descriptive study of stable Cochrane systematic reviews in 2019.
2019-12-09 epidemiology 10.1101/19013912
#1 (18.7%)

Background: From 2006 to 2019, Cochrane reviews could be designated "stable" if they were not being updated but highly likely to be current. This provides an opportunity to observe practice in ending systematic reviewing and what is regarded as enough evidence. Methods: We identified Cochrane reviews designated stable in 2013 and 2019 and reasons for this designation. For those with conclusions stated to be so firm that new evidence is unlikely to change them, we assessed conclusions, strength of e...

12
Systematic review automation tool use by systematic reviewers, health technology assessors and clinical guideline developers: tools used, abandoned, and desired
2021-04-30 epidemiology 10.1101/2021.04.26.21255833
#1 (18.6%)

Objective: We investigated the use of systematic review automation tools by systematic reviewers, health technology assessors and clinical guideline developers. Study design and settings: An online, 16-question survey was distributed across several evidence synthesis, health technology assessment and guideline development organisations internationally. We asked the respondents what tools they use and abandon, how often and when they use the tools, their perceived time savings and accuracy, and desi...

13
Leveraging large language models for systematic reviewing: A case study using HIV medication adherence research
2024-09-19 hiv aids 10.1101/2024.09.18.24313828
#1 (18.5%)

Background: The rapidly accumulating scientific literature in HIV presents a significant challenge in accurately and efficiently assessing the relevant literature. This study explores the potential capabilities of using large language models (LLMs), such as ChatGPT, for selecting relevant studies for a systematic review. Method: Scientific papers were initially obtained from bibliographic database searches using a Boolean search strategy with pre-defined keywords. From 15,839 unique records, three ...

14
Changing patterns in reporting and sharing of review data in systematic reviews with meta-analysis: the REPRISE project
2022-04-18 epidemiology 10.1101/2022.04.11.22273688
#1 (18.4%)

Objectives: To examine changes in completeness of reporting and frequency of sharing data, analytic code and other review materials in systematic reviews (SRs) over time; and factors associated with these changes. Design: Cross-sectional meta-research study. Sample: A random sample of 300 SRs with meta-analysis of aggregate data on the effects of a health, social, behavioural or educational intervention, which were indexed in PubMed, Science Citation Index, Social Sciences Citation Index, Scopus and...

15
Sensitivity, specificity and avoidable workload of using a large language models for title and abstract screening in systematic reviews and meta-analyses
2023-12-17 epidemiology 10.1101/2023.12.15.23300018
#1 (18.0%)

Importance: Systematic reviews are time-consuming and are still performed predominantly manually by researchers despite the exponential growth of scientific literature. Objective: To investigate the sensitivity and specificity, and to estimate the avoidable workload, when using an AI-based large language model (LLM) (Generative Pre-trained Transformer [GPT] version 3.5-Turbo from OpenAI) to perform title and abstract screening in systematic reviews. Data Sources: Unannotated bibliographic databases from fiv...
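
The three quantities this abstract evaluates come straight from the screening confusion matrix: sensitivity is the share of relevant records the model keeps, specificity the share of irrelevant records it discards, and avoidable workload the share of all records a human would no longer need to screen. A sketch under those definitions (the counts in the example are invented):

```python
def screening_metrics(tp, fp, fn, tn):
    """Screening performance from a confusion matrix.

    tp/fn: relevant records the model includes/excludes;
    fp/tn: irrelevant records the model includes/excludes.
    Avoidable workload is the fraction of records the model excludes,
    i.e. records humans no longer screen (at the cost of losing the
    fn relevant records among them).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    avoidable = (fn + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, avoidable

# 1,000 records, 100 of them relevant; the model misses 10 relevant
# records and wrongly keeps 100 irrelevant ones.
sens, spec, saved = screening_metrics(tp=90, fp=100, fn=10, tn=800)
print(sens, round(spec, 3), saved)  # 0.9 0.889 0.81
```

The trade-off the abstract measures is visible here: the 81% workload saving is only acceptable if the 10% of relevant records screened out is tolerable for the review.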

16
boutliers: R package of outlier detection and influence diagnostics for meta-analysis
2025-09-19 health informatics 10.1101/2025.09.18.25336125
#1 (17.7%)

Meta-analysis is an established methodology for evidence synthesis. In practice, substantial heterogeneity often arises among studies, and random-effects models are widely employed as standard tools. However, in many cases of data synthesis, some studies exhibit markedly different characteristics from others, beyond the degree expected from statistical error, and may become influential outliers that affect the overall conclusions. Although outlier detection and influence diagnostic methods have ...

17
Assessment of Bias in Clinical Trials with LLMs Using ROBUST-RCT: A Feasibility Study.
2025-08-13 epidemiology 10.1101/2025.08.12.25333520
#1 (17.6%)

Background: Bias assessment is a crucial step in evaluating evidence from randomized controlled trials. The widely adopted Cochrane RoB 2, designed to identify these issues, is complex, resource-intensive, and unreliable. Advances in artificial intelligence (AI), particularly in the field of large language models (LLMs), now allow the automation of complex tasks. While prior investigations have focused on whether LLMs could perform assessments with RoB 2, integrating technologies does not resolve ...

18
Agreement between ranking metrics in network meta-analysis: an empirical study
2020-02-12 epidemiology 10.1101/2020.02.11.20021055
#1 (17.6%)

Objective: To empirically explore the level of agreement of the treatment hierarchies from different ranking metrics in network meta-analysis (NMA) and to investigate how network characteristics influence the agreement. Design: Empirical evaluation from re-analysis of network meta-analyses. Data: 232 networks of four or more interventions from randomised controlled trials, published between 1999 and 2015. Methods: We calculated treatment hierarchies from several ranking metrics: relative treatment ef...
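
One widely used NMA ranking metric of the kind this abstract compares is SUCRA (surface under the cumulative ranking curve), computed from a treatment's rank probabilities. A sketch under that definition; the probabilities in the example are invented, and whether SUCRA is among the specific metrics in this preprint is not visible in the truncated abstract:

```python
def sucra(rank_probs):
    """SUCRA for one treatment from its rank probabilities.

    rank_probs: probabilities that the treatment occupies rank 1..a
    (best to worst) among a treatments; must sum to 1. SUCRA is the
    mean of the cumulative rank probabilities over ranks 1..a-1:
    1 for a treatment certain to be best, 0 for one certain to be worst.
    """
    a = len(rank_probs)
    cumulative, total = 0.0, 0.0
    for p in rank_probs[:-1]:   # cumulative probabilities for ranks 1..a-1
        cumulative += p
        total += cumulative
    return total / (a - 1)

# Four treatments; this one is usually ranked first or second.
print(round(sucra([0.6, 0.3, 0.1, 0.0]), 3))  # 0.833
```

Other metrics in the literature (e.g. the probability of being best, or mean rank) can order the same treatments differently, which is exactly the disagreement the study quantifies.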

19
Accuracy and reliability of data extraction for systematic reviews using large language models: A protocol for a prospective study
2024-05-22 health informatics 10.1101/2024.05.22.24307740
#1 (17.5%)

Background: Systematic reviews require extensive time and effort to manually extract and synthesize data from numerous screened studies. This study aims to investigate the ability of large language models (LLMs) to automate data extraction with high accuracy and minimal bias, using clinical questions (CQs) of the Japanese Clinical Practice Guidelines for Management of Sepsis and Septic Shock (J-SSCG) 2024. The study will evaluate the accuracy of three LLMs and optimize their command prompts to enh...

20
Removing animal and nonhuman records in Ovid Embase: A comparison of 11 filters
2026-02-17 health informatics 10.64898/2026.02.13.26346239
#1 (17.1%)

Introduction: Several filters are routinely used to remove animal or nonhuman records in Ovid Embase, despite there being no performance data for them. The filters take different approaches in design. Objective: To understand and compare the impact of 11 filters to remove animal or nonhuman records in Ovid Embase. To understand the indexing of relevant subject headings in Embase. Methods: To assess filter performance, we screened and categorised 3,000 records as "should be removed" or "should be reta...